WO 2007/086042



PCT/IL2006/000100 



CLAIMS 

What is claimed is: 

1. A speaker segmentation method for associating an at least one segment
for each of at least two sides of an at least one audio interaction, with one 
of the at least two sides of the interaction using additional information, 
the method comprising: 

a segmentation step for associating the at least one segment with 
one side of the at least one interaction; and 

a scoring step for assigning a score to said segmentation. 

2. The method of claim 1 wherein the additional information is at least one 
of the group consisting of: computer-telephony-integration information 
related to the at least one interaction; spotted words within the at least 
one interaction; data related to the at least one interaction; data related to 
a speaker thereof; external data related to the at least one interaction; or 
data related to at least one other interaction performed by a speaker of the 
at least one interaction. 

3. The method of claim 1 further comprising a model association step for 
scoring the at least one segment against an at least one statistical model 
of one side, and obtaining a model association score. 

4. The method of claim 1 wherein the scoring step uses discriminative 
information for discriminating the at least two sides of the interaction. 

5. The method of claim 4 wherein the scoring step comprises a model 
association step for scoring the at least one segment against an at least 
one statistical model of one side, and obtaining a model association 
score. 

6. The method of claim 5 wherein the scoring step further comprises a 
normalization step for normalizing the at least one model score.

7. The method of claim 4 wherein the scoring step comprises evaluating the 
association of the at least one segment with a side of the interaction using
additional information. 

8. The method of claim 7 wherein the additional information is at least one 
of the group consisting of: computer-telephony-integration information 
related to the at least one interaction; spotted words within the at least 
one interaction; data related to the at least one interaction; data related to 
a speaker thereof; external data related to the at least one interaction; or 
data related to at least one other interaction performed by a speaker of the 
at least one interaction. 

9. The method of claim 1 wherein the scoring step comprises statistical 
scoring. 

10. The method of claim 1 further comprising: 

a step of comparing said score to a threshold; and 
repeating the segmentation step and the scoring step if said score 
is below the threshold. 

11. The method of claim 10 wherein the threshold is predetermined, or 
dynamic, or depends on: information associated with said at least one 
interaction, information associated with an at least one speaker thereof, 
or external information associated with the interaction. 
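
By way of illustration only, and not forming part of the claims, the compare-and-repeat behaviour of claims 10 and 11 can be sketched as follows. The names segment_until_acceptable, segment, score and max_attempts are hypothetical; in particular, the cap on the number of repetitions is an assumption added here so that the loop terminates, and is not recited in the claims.

```python
from typing import Callable, List, Tuple

# One labelled segment: (start_sec, end_sec, side). Hypothetical layout.
Segmentation = List[Tuple[float, float, str]]


def segment_until_acceptable(
    segment: Callable[[int], Segmentation],
    score: Callable[[Segmentation], float],
    threshold: float,
    max_attempts: int = 3,  # assumption: the claims do not bound the repetition
) -> Tuple[Segmentation, float, bool]:
    """Repeat segmentation and scoring while the score is below the threshold.

    Returns the accepted (or best seen) segmentation, its score, and whether
    the threshold was reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    best: Segmentation = []
    best_score = float("-inf")
    for attempt in range(max_attempts):
        candidate = segment(attempt)        # segmentation step
        candidate_score = score(candidate)  # scoring step
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        if candidate_score >= threshold:    # comparing step
            return candidate, candidate_score, True
    # Threshold never reached: hand back the best attempt, flagged as rejected.
    return best, best_score, False
```

The threshold is passed in as a plain number, so a caller may supply a predetermined value or one computed per interaction, as contemplated in claim 11.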

12. The method of claim 1 wherein the segmentation step comprises: 

a parameterization step to transform the speech signal to a set of 
feature vectors in order to generate data more suitable for statistical modeling; 

an anchoring step for locating an anchor segment for each side 
of the interaction; and 

a modeling and classification step for associating at least one 
segment with one side of the interaction.
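
By way of illustration only, and not forming part of the claims, the three sub-steps of claim 12 can be sketched as a toy pipeline. Everything below is an assumption made for the example: the two features (log energy and zero-crossing rate), the choice of the two most distant frames as anchors, and the nearest-anchor classification are simple stand-ins and are not the features or statistical models of the claimed method. The function names parameterize, find_anchors and classify are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def parameterize(signal: Sequence[float], frame_len: int) -> List[Vector]:
    """Parameterization: one (log-energy, zero-crossing rate) vector per frame."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((math.log(energy + 1e-12), crossings / (frame_len - 1)))
    return features


def find_anchors(features: List[Vector]) -> Tuple[int, int]:
    """Anchoring (stand-in): the two frames farthest apart, one per side."""
    if len(features) < 2:
        raise ValueError("need at least two frames to anchor two sides")
    best, best_dist = (0, 1), -1.0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = math.dist(features[i], features[j])
            if d > best_dist:
                best, best_dist = (i, j), d
    return best


def classify(features: List[Vector], anchors: Tuple[int, int]) -> List[int]:
    """Classification (stand-in): label each frame 0 or 1 by nearest anchor."""
    a, b = features[anchors[0]], features[anchors[1]]
    return [0 if math.dist(f, a) <= math.dist(f, b) else 1 for f in features]
```

A caller would run parameterize on the speech samples, pass the result to find_anchors, and then pass both to classify to obtain one side label per frame.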

13. The method of claim 12 wherein the anchoring step or the modeling and
classification step comprises using additional data.

14. The method of claim 13 wherein the additional data is one or more of the
group consisting of: computer-telephony-integration information related
to the at least one interaction; spotted words within the at least one
interaction; data related to the at least one interaction; data related to a 
speaker thereof; external data related to the at least one interaction; or 
data related to at least one other interaction performed by a speaker of the 
at least one interaction. 

15. The method of claim 1 further comprising a preprocessing step for 
enhancing the quality of the interaction. 

16. The method of claim 1 further comprising a speech/non-speech 
segmentation step for eliminating non-speech segments from the 
interaction. 

17. The method of claim 1 wherein the segmentation step comprises scoring 
the at least one segment with a voice model of a known speaker. 

18. A speaker segmentation apparatus for associating an at least one segment 
for each of at least two speakers participating in an at least one audio 
interaction, with a side of the interaction, using additional information, 
the apparatus comprising: 

a segmentation component for associating an at least one segment 
within the interaction with one side of the at least one interaction; and 
a scoring component for assigning a score to said segmentation. 

19. The apparatus of claim 18 wherein the additional information is at least 
one of the group consisting of: computer-telephony-integration 
information related to the at least one interaction; spotted words within 
the at least one interaction; data related to the at least one interaction; 
data related to a speaker thereof; external data related to the at least one
interaction; or data related to at least one other interaction performed by a 
speaker of the at least one interaction. 

20. A quality management apparatus for interaction-rich environments, the 
apparatus comprising: 

a capturing or logging component for capturing or logging an at
least one audio interaction; 

a segmentation component for segmenting the at least one audio 
interaction; and 

a playback component for playing an at least one part of the at
least one audio interaction.